{
 "cells": [
  {
   "cell_type": "markdown",
   "id": "5ab60f84-39b3-4bdd-ae83-6527acb315e5",
   "metadata": {},
   "source": [
    "# LLMs and LlamaIndex ◦ April 4 2024 ◦ Vector Institute Prompt Engineering Lab"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "2276edc2-74ce-4d85-bdaa-0607c74fc981",
   "metadata": {},
   "outputs": [],
   "source": [
    "##################################################################\n",
    "# Venue: Vector Institute Prompt Engineering Lab\n",
    "# Talk: Building RAG Systems with LlamaIndex\n",
    "# Speaker: Andrei Fajardo\n",
    "##################################################################"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "3bee89b8-a04d-4326-9392-b9e7e1bcb8af",
   "metadata": {},
   "source": [
    "![Title Image](https://d3ddy8balm3goa.cloudfront.net/rag-bootcamp-vector/title.excalidraw.svg)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "e4d38b38-ea48-4012-81ae-84e1d1f40a69",
   "metadata": {},
   "source": [
    "![Title Image](https://d3ddy8balm3goa.cloudfront.net/rag-bootcamp-vector/framework.excalidraw.svg)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "34d1f8e7-f978-4f19-bdfb-37c0d235b5bf",
   "metadata": {},
   "source": [
    "#### Notebook Setup & Dependency Installation"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "f227a52a-a147-4e8f-b7d3-e03f983fd5f1",
   "metadata": {},
   "outputs": [],
   "source": [
    "%pip install llama-index-vector-stores-qdrant -q"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "7bc383fc-19b2-47b5-af61-e83210ea9c37",
   "metadata": {},
   "outputs": [],
   "source": [
    "import nest_asyncio\n",
    "\n",
    "nest_asyncio.apply()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "d56aba9c-230b-464d-909c-de17d1bc0aa6",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "mkdir: data: File exists\n",
      "--2024-04-04 11:34:26--  https://vectorinstitute.ai/wp-content/uploads/2024/02/Vector-Annual-Report-2022-23_accessible_rev0224-1.pdf\n",
      "Resolving vectorinstitute.ai (vectorinstitute.ai)... 162.159.134.42\n",
      "Connecting to vectorinstitute.ai (vectorinstitute.ai)|162.159.134.42|:443... connected.\n",
      "HTTP request sent, awaiting response... 200 OK\n",
      "Length: 5305903 (5.1M) [application/pdf]\n",
      "Saving to: ‘./data/Vector-Annual-Report-2022-23_accessible_rev0224-1.pdf’\n",
      "\n",
      "./data/Vector-Annua 100%[===================>]   5.06M  14.2MB/s    in 0.4s    \n",
      "\n",
      "2024-04-04 11:34:27 (14.2 MB/s) - ‘./data/Vector-Annual-Report-2022-23_accessible_rev0224-1.pdf’ saved [5305903/5305903]\n",
      "\n"
     ]
    }
   ],
   "source": [
    "!mkdir data\n",
    "!wget \"https://vectorinstitute.ai/wp-content/uploads/2024/02/Vector-Annual-Report-2022-23_accessible_rev0224-1.pdf\" -O \"./data/Vector-Annual-Report-2022-23_accessible_rev0224-1.pdf\""
   ]
  },
  {
   "cell_type": "markdown",
   "id": "275c00f1-e358-498a-88c3-8e810a5a2546",
   "metadata": {},
   "source": [
    "## Motivation\n",
    "\n",
    "![Motivation Image](https://d3ddy8balm3goa.cloudfront.net/rag-bootcamp-vector/motivation.excalidraw.svg)\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "25d4ce76-8eea-44cb-aa99-94844dfed9c7",
   "metadata": {},
   "outputs": [],
   "source": [
    "# query an LLM and ask it about Mistplay\n",
    "from llama_index.llms.openai import OpenAI\n",
    "\n",
    "llm = OpenAI(model=\"gpt-4-turbo-preview\")\n",
    "\n",
    "response = llm.complete(\"What is Vector Institute all about?\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "c3f18489-4f25-40ce-86e9-697ddea7d6c6",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "The Vector Institute is a research institution in Toronto, Canada, dedicated to the field of artificial intelligence (AI), with a particular focus on machine learning and deep learning. It was established in March 2017 as part of Canada's strategy to lead in the AI domain, leveraging the country's significant contributions to the development of AI technologies.\n",
      "\n",
      "The institute's mission revolves around several key objectives:\n",
      "\n",
      "1. **Research Excellence**: Vector aims to attract, retain, and develop top talent in the AI field. It fosters an environment where leading researchers and industry experts can collaborate on cutting-edge projects, pushing the boundaries of what AI can achieve.\n",
      "\n",
      "2. **Industry Collaboration**: The institute works closely with companies across various sectors, helping them to adopt AI technologies to drive innovation and competitiveness. This includes providing guidance on AI strategy, development, and deployment.\n",
      "\n",
      "3. **Talent Development**: Recognizing the critical need for skilled professionals in the AI field, Vector offers educational programs and training opportunities. These initiatives are designed to equip students, professionals, and leaders with the knowledge and skills necessary to excel in the rapidly evolving AI landscape.\n",
      "\n",
      "4. **Thought Leadership**: Vector contributes to the global conversation on AI by sharing insights, research findings, and best practices. It aims to influence policy and decision-making, ensuring that AI development and deployment are guided by ethical considerations and societal needs.\n",
      "\n",
      "The Vector Institute is part of a broader Canadian AI ecosystem, which includes other leading institutions like the Montreal Institute for Learning Algorithms (MILA) and the Alberta Machine Intelligence Institute (Amii). Together, these organizations help to solidify Canada's position as a world leader in AI research and innovation.\n"
     ]
    }
   ],
   "source": [
    "print(response)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "80b2f9c3-1e07-4178-9e7e-220355725955",
   "metadata": {},
   "outputs": [],
   "source": [
    "response = llm.complete(\n",
    "    \"According to Vector Institute's Annual Report 2022-2023, \"\n",
    "    \"how many AI jobs were created in Ontario?\"\n",
    ")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "6a1c6424-cf2f-4ee1-b601-674b03354e35",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "As of my last update in April 2023, I don't have access to the specific details of the Vector Institute's Annual Report for 2022-2023. Therefore, I cannot provide the exact number of AI jobs created in Ontario as reported in that document. For the most accurate and up-to-date information, I recommend checking the Vector Institute's official website or contacting them directly. They may have published the report or related summaries that include the data you're interested in.\n"
     ]
    }
   ],
   "source": [
    "print(response)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "04a0ef8d-d55c-4b64-887b-18d343503a76",
   "metadata": {},
   "source": [
    "## Basic RAG in 3 Steps\n",
    "\n",
    "![Divider Image](https://d3ddy8balm3goa.cloudfront.net/rag-bootcamp-vector/subheading.excalidraw.svg)\n",
    "\n",
    "\n",
    "1. Build external knowledge (i.e., uploading updated data sources)\n",
    "2. Retrieve\n",
    "3. Augment and Generate"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "598a5257-20ae-468e-85d6-d4e8c46b8cb5",
   "metadata": {},
   "source": [
    "## 1. Build External Knowledge\n",
    "\n",
    "![Divider Image](https://d3ddy8balm3goa.cloudfront.net/rag-bootcamp-vector/step1.excalidraw.svg)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "a2963f90-9da5-4a0d-8dbe-f16fcb8627a3",
   "metadata": {},
   "outputs": [],
   "source": [
    "\"\"\"Load the data.\n",
    "\n",
    "With llama-index, before any transformations are applied,\n",
    "data is loaded in the `Document` abstraction, which is\n",
    "a container that holds the text of the document.\n",
    "\"\"\"\n",
    "\n",
    "from llama_index.core import SimpleDirectoryReader\n",
    "\n",
    "loader = SimpleDirectoryReader(input_dir=\"./data\")\n",
    "documents = loader.load_data()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "da321e2c-8428-4c04-abf2-b204416e816f",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "'Vector Institute   |   Annual Report 2022-2023      2\\nTABLE OF CONTENTS\\nMESSAGE FROM THE BOARD CHAIR  3 \\nMESSAGE FROM THE PRESIDENT AND CEO 4\\nVECTOR’S VISION AND MISSION 5\\nONTARIO’S VIBRANT AI ECOSYSTEM 6\\nINDUSTRY INNOVATION   7\\nRESEARCH AND EDUCATION  1 4TALENT AND WORKFORCE DEVELOPMENT 24\\nHEALTH   3 1\\nAI ENGINEERING   36\\nTHOUGHT LEADERSHIP  39\\nFINANCIALS 42\\n'"
      ]
     },
     "execution_count": null,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# if you want to see what the text looks like\n",
    "documents[1].text"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "4801e74a-8c52-45c4-967d-7a1a94f54ad3",
   "metadata": {},
   "outputs": [],
   "source": [
    "\"\"\"Chunk, Encode, and Store into a Vector Store.\n",
    "\n",
    "To streamline the process, we can make use of the IngestionPipeline\n",
    "class that will apply your specified transformations to the\n",
    "Document's.\n",
    "\"\"\"\n",
    "\n",
    "from llama_index.core.ingestion import IngestionPipeline\n",
    "from llama_index.core.node_parser import SentenceSplitter\n",
    "from llama_index.embeddings.openai import OpenAIEmbedding\n",
    "from llama_index.vector_stores.qdrant import QdrantVectorStore\n",
    "import qdrant_client\n",
    "\n",
    "client = qdrant_client.QdrantClient(location=\":memory:\")\n",
    "vector_store = QdrantVectorStore(client=client, collection_name=\"test_store\")\n",
    "\n",
    "pipeline = IngestionPipeline(\n",
    "    transformations=[\n",
    "        SentenceSplitter(),\n",
    "        OpenAIEmbedding(),\n",
    "    ],\n",
    "    vector_store=vector_store,\n",
    ")\n",
    "_nodes = pipeline.run(documents=documents, num_workers=4)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "02afea25-098b-49c7-a965-21c7576757af",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "'Vector Institute   |   Annual Report 2022-2023      2\\nTABLE OF CONTENTS\\nMESSAGE FROM THE BOARD CHAIR  3 \\nMESSAGE FROM THE PRESIDENT AND CEO 4\\nVECTOR’S VISION AND MISSION 5\\nONTARIO’S VIBRANT AI ECOSYSTEM 6\\nINDUSTRY INNOVATION   7\\nRESEARCH AND EDUCATION  1 4TALENT AND WORKFORCE DEVELOPMENT 24\\nHEALTH   3 1\\nAI ENGINEERING   36\\nTHOUGHT LEADERSHIP  39\\nFINANCIALS 42'"
      ]
     },
     "execution_count": null,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# if you want to see the nodes\n",
    "# len(_nodes)\n",
    "_nodes[1].text"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "44cd8a86-089d-4329-9484-35b98b3a26f9",
   "metadata": {},
   "outputs": [],
   "source": [
    "\"\"\"Create a llama-index... wait for it... Index.\n",
    "\n",
    "After uploading your encoded documents into your vector\n",
    "store of choice, you can connect to it with a VectorStoreIndex\n",
    "which then gives you access to all of the llama-index functionality.\n",
    "\"\"\"\n",
    "\n",
    "from llama_index.core import VectorStoreIndex\n",
    "\n",
    "index = VectorStoreIndex.from_vector_store(vector_store=vector_store)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "286b1827-7547-49c6-aba3-82f08d6d86b8",
   "metadata": {},
   "source": [
    "## 2. Retrieve Against A Query\n",
    "\n",
    "![Step2 Image](https://d3ddy8balm3goa.cloudfront.net/rag-bootcamp-vector/step2.excalidraw.svg)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "49f86af1-db08-4641-89ad-d60abd04e6b3",
   "metadata": {},
   "outputs": [],
   "source": [
    "\"\"\"Retrieve relevant documents against a query.\n",
    "\n",
    "With our Index ready, we can now query it to\n",
    "retrieve the most relevant document chunks.\n",
    "\"\"\"\n",
    "\n",
    "retriever = index.as_retriever(similarity_top_k=2)\n",
    "retrieved_nodes = retriever.retrieve(\n",
    "    \"According to Vector Institute's Annual Report 2022-2023, \"\n",
    "    \"how many AI jobs were created in Ontario?\"\n",
    ")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "06b04a79-2a28-4a21-b15b-4bb6c7b1a227",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "[NodeWithScore(node=TextNode(id_='af5b8613-9990-492f-8a0c-94ced4be3c61', embedding=None, metadata={'page_label': '6', 'file_name': 'Vector-Annual-Report-2022-23_accessible_rev0224-1.pdf', 'file_path': '/Users/nerdai/talks/2024/vector-pe-lab/data/Vector-Annual-Report-2022-23_accessible_rev0224-1.pdf', 'file_type': 'application/pdf', 'file_size': 5305903, 'creation_date': '2024-04-04', 'last_modified_date': '2024-02-12'}, excluded_embed_metadata_keys=['file_name', 'file_type', 'file_size', 'creation_date', 'last_modified_date', 'last_accessed_date'], excluded_llm_metadata_keys=['file_name', 'file_type', 'file_size', 'creation_date', 'last_modified_date', 'last_accessed_date'], relationships={<NodeRelationship.SOURCE: '1'>: RelatedNodeInfo(node_id='47692fff-7f25-4472-846f-c3f8d30a16e5', node_type=<ObjectType.DOCUMENT: '4'>, metadata={'page_label': '6', 'file_name': 'Vector-Annual-Report-2022-23_accessible_rev0224-1.pdf', 'file_path': '/Users/nerdai/talks/2024/vector-pe-lab/data/Vector-Annual-Report-2022-23_accessible_rev0224-1.pdf', 'file_type': 'application/pdf', 'file_size': 5305903, 'creation_date': '2024-04-04', 'last_modified_date': '2024-02-12'}, hash='3bbac897823b34c5131c0df0923d9cc6125119814b241c192bb600a50328b432'), <NodeRelationship.PREVIOUS: '2'>: RelatedNodeInfo(node_id='a84bb9b1-771a-40b1-b923-bb5c7d18f02b', node_type=<ObjectType.TEXT: '1'>, metadata={'page_label': '5', 'file_name': 'Vector-Annual-Report-2022-23_accessible_rev0224-1.pdf', 'file_path': '/Users/nerdai/talks/2024/vector-pe-lab/data/Vector-Annual-Report-2022-23_accessible_rev0224-1.pdf', 'file_type': 'application/pdf', 'file_size': 5305903, 'creation_date': '2024-04-04', 'last_modified_date': '2024-02-12'}, hash='31dd177bdc920cab3187722b550baa45b2b0307b78132444b54c24eb36892c3d'), <NodeRelationship.NEXT: '3'>: RelatedNodeInfo(node_id='b70932b9-b9b2-4094-9e87-9cdc2ed6b90c', node_type=<ObjectType.TEXT: '1'>, metadata={}, hash='c8c0e5beea0643c8f86a849ae1c39c181029d182609af9b8dfc1a08d39c9b0b4')}, text='6\\nVector Institute   |   Annual Report 2022-2023      \\nONTARIO’S VIBRANT AI ECOSYSTEM\\nThe Vector Institute, in collaboration with Deloitte Canada, produces annual snapshots of the \\nOntario AI ecosystem, revealing its continued strength and role in nurturing top AI talent.\\nHIGHLIGHTS OF ONTARIO’S AI ECOSYSTEM\\nMore Ontario data will be available soon on the Vector Institute website . \\n20,634 \\nAI jobs created in Ontario75,975 \\nAI jobs retained in Ontario*1,503 \\nnew AI master’s and study \\npath enrollments1,050\\nnew AI master’s graduates from \\nVector-recognized programs248\\nnew AI-related patents filed\\nacross Canada\\n$1.16 BILLION\\nin AI-related Ontario \\nVC investments312\\ncompanies investe\\nd in the \\nOntario AI ecosystem23% \\nincrease in AI R&D spending or budgets*27\\nnew AI companies established in Ontario68%\\nof CEOs surveyed reported adopting AI solutions*\\n*In this report, Deloitte refined the methodology to improve accuracy of this calculation as well as validated it against other external data sources.', start_char_idx=0, end_char_idx=1009, text_template='{metadata_str}\\n\\n{content}', metadata_template='{key}: {value}', metadata_seperator='\\n'), score=0.9154188120411539),\n",
       " NodeWithScore(node=TextNode(id_='6af722ec-7d4a-40a9-80a0-d0c7919c5fd1', embedding=None, metadata={'page_label': '24', 'file_name': 'Vector-Annual-Report-2022-23_accessible_rev0224-1.pdf', 'file_path': '/Users/nerdai/talks/2024/vector-pe-lab/data/Vector-Annual-Report-2022-23_accessible_rev0224-1.pdf', 'file_type': 'application/pdf', 'file_size': 5305903, 'creation_date': '2024-04-04', 'last_modified_date': '2024-02-12'}, excluded_embed_metadata_keys=['file_name', 'file_type', 'file_size', 'creation_date', 'last_modified_date', 'last_accessed_date'], excluded_llm_metadata_keys=['file_name', 'file_type', 'file_size', 'creation_date', 'last_modified_date', 'last_accessed_date'], relationships={<NodeRelationship.SOURCE: '1'>: RelatedNodeInfo(node_id='a51fcb51-0caa-44a8-ac08-46c0f9781b28', node_type=<ObjectType.DOCUMENT: '4'>, metadata={'page_label': '24', 'file_name': 'Vector-Annual-Report-2022-23_accessible_rev0224-1.pdf', 'file_path': '/Users/nerdai/talks/2024/vector-pe-lab/data/Vector-Annual-Report-2022-23_accessible_rev0224-1.pdf', 'file_type': 'application/pdf', 'file_size': 5305903, 'creation_date': '2024-04-04', 'last_modified_date': '2024-02-12'}, hash='2be0f87269fa3f0366e32912eb84d70c2e99facbfb9b15a32a59786d3f04c4dc'), <NodeRelationship.PREVIOUS: '2'>: RelatedNodeInfo(node_id='d10dbc11-c5e3-445e-aa74-73bb7c44f7e5', node_type=<ObjectType.TEXT: '1'>, metadata={'page_label': '23', 'file_name': 'Vector-Annual-Report-2022-23_accessible_rev0224-1.pdf', 'file_path': '/Users/nerdai/talks/2024/vector-pe-lab/data/Vector-Annual-Report-2022-23_accessible_rev0224-1.pdf', 'file_type': 'application/pdf', 'file_size': 5305903, 'creation_date': '2024-04-04', 'last_modified_date': '2024-02-12'}, hash='efbd640facc631c0c4ff11a69c807b2edf1204b53f3bcb9ca19afad9c7d5abab'), <NodeRelationship.NEXT: '3'>: RelatedNodeInfo(node_id='d21e5d50-1020-4491-987e-fec51f1c2f60', node_type=<ObjectType.TEXT: '1'>, metadata={}, hash='1978c85f73b84bc6fdce8b89f5475b65392371816a70973d65b3a6f5fd57e0c1')}, text='24\\nVector Institute   |   Annual Report 2022-2023      \\nTALENT AND WORKFORCE DEVELOPMENT\\nVECTOR’S CRUCIAL ROLE IN ADVANCING \\nONTARIO’S ECONOMY\\nVector works with both universities and employers to address the \\nincreasing demand for AI expertise by creating channels for promising students to access degree programs that cultivate the most sought-after AI skills in industry. \\nThrough Vector’s initiatives and experiential learning opportunities, \\nstudents learn crucial AI technical and job-related competencies, make connections with leading employers, and contribute to the constructive hiring outcomes that are reshaping Ontario’s economy. \\nAs of March 31, 2023, more than 1,325 students from \\nVector-recognized programs and AI study paths, including Vector Institute Scholarship in Artificial In elligence (VSAI) recipients, have been hired by Ontario employers.\"Ontario is a strong player in the field of artificial \\nintelligence, technology, and research, and we are thrilled to see so many new graduates looking to invest their skills and ideas in the province.\\n\"\\n-Vic Fedeli, Minister of Economic Development, Job Creationand Trade of Ontario*\\n*Quote source', start_char_idx=0, end_char_idx=1164, text_template='{metadata_str}\\n\\n{content}', metadata_template='{key}: {value}', metadata_seperator='\\n'), score=0.9083415588705721)]"
      ]
     },
     "execution_count": null,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# to view the retrieved nodes\n",
    "retrieved_nodes"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "978ae2c5-8c2a-41c7-a2eb-85a5562f2db5",
   "metadata": {},
   "source": [
    "## 3. Generate Final Response\n",
    "\n",
    "![Step3 Image](https://d3ddy8balm3goa.cloudfront.net/rag-bootcamp-vector/step3.excalidraw.svg)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "ef33c349-eed4-4e35-9b5d-9473adf2ce01",
   "metadata": {},
   "outputs": [],
   "source": [
    "\"\"\"Context-Augemented Generation.\n",
    "\n",
    "With our Index ready, we can create a QueryEngine\n",
    "that handles the retrieval and context augmentation\n",
    "in order to get the final response.\n",
    "\"\"\"\n",
    "\n",
    "query_engine = index.as_query_engine()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "4139c48a-ece8-4244-b4eb-7cff74cb1325",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Context information is below.\n",
      "---------------------\n",
      "{context_str}\n",
      "---------------------\n",
      "Given the context information and not prior knowledge, answer the query.\n",
      "Query: {query_str}\n",
      "Answer: \n"
     ]
    }
   ],
   "source": [
    "# to inspect the default prompt being used\n",
    "print(\n",
    "    query_engine.get_prompts()[\n",
    "        \"response_synthesizer:text_qa_template\"\n",
    "    ].default_template.template\n",
    ")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "6179639d-af96-4a09-b440-b47ad599a26f",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "20,634 AI jobs were created in Ontario according to Vector Institute's Annual Report 2022-2023.\n"
     ]
    }
   ],
   "source": [
    "response = query_engine.query(\n",
    "    \"According to Vector Institute's Annual Report 2022-2023, \"\n",
    "    \"how many AI jobs were created in Ontario?\"\n",
    ")\n",
    "print(response)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "a050bb7f-266c-46c7-8773-eccb909e5fd6",
   "metadata": {},
   "source": [
    "## More Queries"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "20be9033-f240-4505-9761-29e9455c8974",
   "metadata": {},
   "source": [
    "### Comparisons"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "c86e99c2-601e-443e-aed2-ebe25106a966",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "27\n"
     ]
    }
   ],
   "source": [
    "query = (\n",
    "    \"According to Vector Institute's 2022-2023 annual report, \"\n",
    "    \"how many new AI companies were established in Ontario?\"\n",
    ")\n",
    "\n",
    "response = query_engine.query(query)\n",
    "print(response)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "b02d9ae9-42ed-4f6e-8c2f-709d7e97dd22",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "In the years 2022 and 2023, the dollar value for Unrestricted net assets was $51,630,471 and $52,937,983, respectively.\n"
     ]
    }
   ],
   "source": [
    "query = (\n",
    "    \"According to Vector Institute's 2022-2023 annual report, \"\n",
    "    \"what was the dollar value for Unrestricted net assets in \"\n",
    "    \"the years 2022 & 2023?\"\n",
    ")\n",
    "\n",
    "response = query_engine.query(query)\n",
    "print(response)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "5bafbb1e-96e7-4a9f-909f-e7bfffd76532",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "According to Vector Institute's 2022-2023 annual report, the platinum sponsors were BMO Financial Group, Google, Loblaw Companies Ltd., NVIDIA, RBC, Scotiabank, Shopify Inc., and TD Bank Group.\n"
     ]
    }
   ],
   "source": [
    "query = (\n",
    "    \"According to Vector Institute's 2022-2023 annual report, \"\n",
    "    \"what companies were platinum sponsors?\"\n",
    ")\n",
    "\n",
    "response = query_engine.query(query)\n",
    "print(response)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "dae63946-be38-4807-af2a-8113661a806b",
   "metadata": {},
   "source": [
    "## In Summary\n",
    "\n",
    "- LLMs as powerful as they are, don't perform too well with knowledge-intensive tasks (domain specific, updated data, long-tail)\n",
    "- Context augmentation has been shown (in a few studies) to outperform LLMs without augmentation\n",
    "- In this notebook, we showed one such example that follows that pattern."
   ]
  },
  {
   "cell_type": "markdown",
   "id": "53bfd859-924a-4748-b4d8-ee0f2dc52e3b",
   "metadata": {},
   "source": [
    "## Storing Results In Structured Objects"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "01fb77fd-f4e8-42b0-9704-d2229f4e66a8",
   "metadata": {},
   "source": [
    "![DataExtractions](https://media.licdn.com/dms/image/D4E22AQGwPmZ5RRhbyA/feedshare-shrink_1280/0/1711823067172?e=1715212800&v=beta&t=fJtksPZ3Fm-BOrKRCwa6BrYyuxlNFDuop3ZSopMl53M)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "a63649cf-7f3a-4f6d-a383-92163f71b03c",
   "metadata": {},
   "outputs": [],
   "source": [
    "import json\n",
    "from llama_index.core.bridge.pydantic import BaseModel, Field\n",
    "from typing import List\n",
    "\n",
    "from llama_index.program.openai import OpenAIPydanticProgram\n",
    "from llama_index.llms.openai import OpenAI"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "7f306da0-4da6-45b5-b9b3-50d8d0b9883a",
   "metadata": {},
   "source": [
    "### Sponsors"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "b37bc7e7-4ee8-40e8-891f-0307997049a7",
   "metadata": {},
   "outputs": [],
   "source": [
    "class VectorSponsors(BaseModel):\n",
    "    \"\"\"Data model for Vector Institute sponsors 2022-2023.\"\"\"\n",
    "\n",
    "    platinum: str = Field(description=\"Platinum sponsors\")\n",
    "    gold: str = Field(description=\"Gold sponsors\")\n",
    "    silver: str = Field(description=\"Silver sponsors\")\n",
    "    bronze: List[str] = Field(description=\"Bronze sponsors\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "ec28d600-5001-4176-bd9f-5982c05d02a3",
   "metadata": {},
   "outputs": [],
   "source": [
    "prompt_template_str = \"\"\"\\\n",
    "Here is the 2022-2023 Annual Report for Vector Institute:\n",
    "{document_text}\n",
    "Provide the names sponsor companies according to their tiers.\n",
    "\"\"\"\n",
    "\n",
    "program = OpenAIPydanticProgram.from_defaults(\n",
    "    output_cls=VectorSponsors,\n",
    "    prompt_template_str=prompt_template_str,\n",
    "    llm=OpenAI(\"gpt-4-turbo-preview\"),\n",
    "    verbose=True,\n",
    ")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "bf0b8186-3e5f-4690-a74c-6be28f1f3d74",
   "metadata": {},
   "outputs": [],
   "source": [
    "report_as_text = \"\"\n",
    "for d in documents:\n",
    "    report_as_text += d.text"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "0de616a2-c96a-4be0-b8ba-04e537d03e61",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Function call: VectorSponsors with args: {\"platinum\":\"BMO Financial Group, Google, Loblaw Companies Ltd., NVIDIA, RBC, Scotiabank, Shopify Inc., TD Bank Group, Thomson Reuters\",\"gold\":\"Accenture, Air Canada, Bell Canada, Boehringer Ingelheim (Canada) Ltd., Canadian Tire Corporation, Ltd., CIBC, CN, Deloitte Canada, EY Canada, Georgian, KPMG Canada, KT Corporation, Magna International, OMERS, PwC Canada, Roche Canada, Sun Life Financial, TELUS, Thales Canada\",\"silver\":\"EllisDon Corporation, Linamar Corporation\",\"bronze\":[\"Ada\",\"ALS\",\"GoldSpot Discoveries Ltd.\",\"AltaML\",\"Avidbots\",\"BenchSci\",\"Blue J\",\"Canvass Analytics Inc.\",\"Clearpath\",\"Cohere\",\"Cyclica\",\"Darwin AI\",\"Deep Genomics\",\"FreshBooks\",\"Integrate.ai\",\"Layer 6\",\"League\",\"MindBridge Analytics Inc.\",\"Private AI\",\"Riskfuel\",\"Shakudo\",\"Signal 1\",\"Stradigi AI\",\"Surgical Safety Technologies\",\"TealBook\",\"Troj.AI\",\"Wysdom AI\"]}\n"
     ]
    }
   ],
   "source": [
    "sponsors = program(document_text=report_as_text)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "41f8b4b0-0f19-4101-9642-1b5bc555dfac",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "{\n",
      "    \"platinum\": \"BMO Financial Group, Google, Loblaw Companies Ltd., NVIDIA, RBC, Scotiabank, Shopify Inc., TD Bank Group, Thomson Reuters\",\n",
      "    \"gold\": \"Accenture, Air Canada, Bell Canada, Boehringer Ingelheim (Canada) Ltd., Canadian Tire Corporation, Ltd., CIBC, CN, Deloitte Canada, EY Canada, Georgian, KPMG Canada, KT Corporation, Magna International, OMERS, PwC Canada, Roche Canada, Sun Life Financial, TELUS, Thales Canada\",\n",
      "    \"silver\": \"EllisDon Corporation, Linamar Corporation\",\n",
      "    \"bronze\": [\n",
      "        \"Ada\",\n",
      "        \"ALS\",\n",
      "        \"GoldSpot Discoveries Ltd.\",\n",
      "        \"AltaML\",\n",
      "        \"Avidbots\",\n",
      "        \"BenchSci\",\n",
      "        \"Blue J\",\n",
      "        \"Canvass Analytics Inc.\",\n",
      "        \"Clearpath\",\n",
      "        \"Cohere\",\n",
      "        \"Cyclica\",\n",
      "        \"Darwin AI\",\n",
      "        \"Deep Genomics\",\n",
      "        \"FreshBooks\",\n",
      "        \"Integrate.ai\",\n",
      "        \"Layer 6\",\n",
      "        \"League\",\n",
      "        \"MindBridge Analytics Inc.\",\n",
      "        \"Private AI\",\n",
      "        \"Riskfuel\",\n",
      "        \"Shakudo\",\n",
      "        \"Signal 1\",\n",
      "        \"Stradigi AI\",\n",
      "        \"Surgical Safety Technologies\",\n",
      "        \"TealBook\",\n",
      "        \"Troj.AI\",\n",
      "        \"Wysdom AI\"\n",
      "    ]\n",
      "}\n"
     ]
    }
   ],
   "source": [
    "print(json.dumps(sponsors.dict(), indent=4))"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "17c1c027-be8b-48f4-87ee-06f3e2c71797",
   "metadata": {},
   "source": [
    "### Useful links\n",
    "\n",
    "[website](https://www.llamaindex.ai/) ◦ [llamahub](https://llamahub.ai) ◦ [llamaparse](https://github.com/run-llama/llama_parse) ◦ [github](https://github.com/run-llama/llama_index) ◦ [medium](https://medium.com/@llama_index) ◦ [rag-bootcamp-poster](https://d3ddy8balm3goa.cloudfront.net/rag-bootcamp-vector/final_poster.excalidraw.svg)"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "vector-pe",
   "language": "python",
   "name": "vector-pe"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}
